23:14
2026-05-30
lesswrong.com
large-language-models
How's it going? Reinforcement learning in language models recruits a functional welfare axis
Researchers at NYU, in collaboration with David Chalmers and Pavel Izmailov, have discovered that reinforcement learning in language models recruits a functional "welfare axis", a concept vector that…